Aggression and Complexity in Trump’s Rhetoric

An Analysis of 2020 Presidential Campaign Speeches

Kayla Muller

2025-05-06

Table of Contents

  • Introduction
  • Data
  • Aggression Time Trend
  • Simplicity Time Trend
  • Topic Modeling: Aggression in the 75th Percentile
  • Topic Modeling: Simplicity in the 75th Percentile
  • Conclusion

Introduction

Political Rhetoric

  • Shapes public perception of politicians
  • Plays a particularly important role in voter behavior and public opinion during presidential elections

Research Question:

Is there a correlation between aggression and rhetorical complexity in Donald Trump’s 2020 presidential campaign speeches?

How does Trump speak about political events?

Let’s look at a comparison between Obama and Trump.

Data

Chalkiadakis, Ioannis; Anglès d’Auriac, Louise; Peters, Gareth; and Frau-Meigs, Divina. A text dataset of campaign speeches of the main tickets in the 2020 US presidential election (September 20, 2024).

  • This analysis uses Trump’s campaign speeches from the 2020 presidential election to assess whether, and to what extent, aggression correlates with rhetorical simplicity.
  • The dataset consists of 235 official transcripts of Donald Trump’s speeches from his 2020 presidential campaign, spanning January 2019 through January 2021.

Monthly Average Aggression Ratio

  • Aggression Dictionary {idiot, ignorant, spite, humiliate, disgrace, …}

\[\text{Aggression Ratio} = \frac{\text{Number of Aggressive Words}}{\text{Total Number of Words}} \times 100\]
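As implemented in the appendix, the ratio is computed with word-boundary regex matching and expressed as a percentage. A minimal sketch, using a toy three-word subset of the dictionary (the full list appears in the appendix):

```python
import re

# Toy subset of the aggression dictionary (full list in the appendix)
aggression_words = {"idiot", "disgrace", "humiliate"}

# Word-boundary-safe, case-insensitive pattern, as in the appendix code
pattern = re.compile(
    r"\b(" + "|".join(map(re.escape, aggression_words)) + r")\b", re.IGNORECASE
)

def aggression_ratio(text: str) -> float:
    """Percentage of tokens that match the aggression dictionary."""
    total = len(re.findall(r"\b\w+\b", text))
    hits = len(pattern.findall(text))
    return hits / total * 100

print(round(aggression_ratio("It is a disgrace, a total disgrace"), 2))  # → 28.57
```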

Monthly Average Aggression Ratio

Aggression in the 75th Percentile

Visualizing the subset of the 21 most aggressive speeches: those with an aggression ratio above the 75th-percentile threshold of 0.206258.

Rhetoric Complexity

Monthly Average Flesch Score

Flesch-Kincaid Reading Ease

  • Readability test scored from 0 to 100; higher scores indicate easier-to-read text.
  • For reference, scores of 60-70 correspond to an 8th/9th-grade reading level.
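The score itself comes from a fixed formula over word, sentence, and syllable counts (the analysis uses the textstat implementation; syllable-counting heuristics differ slightly between implementations). A minimal sketch of the formula from raw counts:

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from raw counts; higher = easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# e.g. 100 words in 10 sentences with 130 syllables
print(round(flesch_reading_ease(100, 10, 130), 3))  # → 86.705
```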

Simplicity in the 75th Percentile

Speeches with a flesch_score above 68.72 (the 75th percentile) are Trump’s simplest. This subset contains 59 of the 235 total speeches.
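The 68.72 cutoff is simply the 75th percentile of the flesch_score column. A stdlib sketch of the thresholding step, with hypothetical scores standing in for the real 235-speech column:

```python
from statistics import quantiles

# Hypothetical Flesch scores; the real analysis uses the 235-speech column
scores = [55.0, 60.0, 62.5, 65.0, 68.0, 70.0, 72.0, 75.0]

# "inclusive" matches pandas' default linear interpolation for quantiles
q1, median, q3 = quantiles(scores, n=4, method="inclusive")

# Keep only speeches above the 75th-percentile threshold
simplest = [s for s in scores if s > q3]

print(q3, simplest)  # → 70.5 [72.0, 75.0]
```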

Topic Modeling: Aggression in the 75th Percentile

We apply Latent Dirichlet Allocation (LDA) to identify the top topics in the 75th percentile of aggressive speeches.

LatentDirichletAllocation(n_components=11, random_state=42)
Topic #1:  know peopl said want dont say great thing right think
Topic #2:  iran terror nation countri futur year want state sign think
Topic #3:  act nation decis presid militari secur administr author section abil
Topic #4:  presid trump biden want elect said vote peopl pennsylvania dont
Topic #5:  iran unit nuclear iranian regim world state sanction missil weapon
Topic #6:  thank american unit viru state flag action world nation peopl
Topic #7:  woman appoint holocaust famili day ensur busi unit state american
Topic #8:  border countri law immigr illeg secur mexico unit state year
Topic #9:  american china peopl hong kong cancer year state world unit
Topic #10:  race sex order agenc feder state shall individu train child
Topic #11:  american america nation countri year thank state great peopl world

Top Topics in Aggressive Speeches

Semantic Coherence (Aggressivity)

Training LDA model for 2 topics...
Number of topics: 2, Coherence Score: 0.41836909439474695
Training LDA model for 5 topics...
Number of topics: 5, Coherence Score: 0.40381606014590804
Training LDA model for 8 topics...
Number of topics: 8, Coherence Score: 0.4081563907379523
Training LDA model for 11 topics...
Number of topics: 11, Coherence Score: 0.4211153278595033
Training LDA model for 14 topics...
Number of topics: 14, Coherence Score: 0.4059432468988091
Training LDA model for 17 topics...
Number of topics: 17, Coherence Score: 0.42100164166919735
Training LDA model for 20 topics...
Number of topics: 20, Coherence Score: 0.4120166602889251
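The topic count is then chosen as the one with the highest coherence score. A one-line selection step over the scores logged above (values rounded):

```python
# Coherence scores from the log above (rounded to four decimals)
coherence = {2: 0.4184, 5: 0.4038, 8: 0.4082, 11: 0.4211, 14: 0.4059, 17: 0.4210, 20: 0.4120}

# Pick the topic count with the highest coherence
best_k = max(coherence, key=coherence.get)
print(best_k)  # → 11, matching the n_components=11 used for the LDA model
```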

WordCloud Top Topics: Aggression

Topic Modeling: Simplicity in the 75th Percentile

LatentDirichletAllocation(n_components=11, random_state=42)
Topic #1:  peopl think thing meet good lot know number big countri
Topic #2:  thank peopl countri want know great theyr think dont said
Topic #3:  dont want know said peopl say year theyr think right
Topic #4:  crowd number great happen mani weve thank big come tremend
Topic #5:  said think approv thing peopl militari magazin want know make
Topic #6:  presid trump said know want dont peopl year say great
Topic #7:  lawn mcconnel south spike trip juli phenomen mildli review staff
Topic #8:  percent charg countri think lot deal great tariff trade happen
Topic #9:  lawn mcconnel south spike trip juli phenomen mildli review staff
Topic #10:  meet mildli ohio pretti question land new hour folk coupl
Topic #11:  peopl thank great know said laughter right like year american

Top Topics In Linguistically Simple Speeches

Semantic Coherence (Simplicity)

Training LDA model for 2 topics...
Number of topics: 2, Coherence Score: 0.4120166602889251
Training LDA model for 5 topics...
Number of topics: 5, Coherence Score: 0.4120166602889251
Training LDA model for 8 topics...
Number of topics: 8, Coherence Score: 0.4120166602889251
Training LDA model for 11 topics...
Number of topics: 11, Coherence Score: 0.4120166602889251
Training LDA model for 14 topics...
Number of topics: 14, Coherence Score: 0.4120166602889251
Training LDA model for 17 topics...
Number of topics: 17, Coherence Score: 0.4120166602889251
Training LDA model for 20 topics...
Number of topics: 20, Coherence Score: 0.4120166602889251

WordCloud Top Topics: Simplicity

Conclusion

Summary

  • Moderate inverse relationship between aggression and linguistic simplicity
  • Topics linked to aggression: Justice and order, fake news, China, and immigration
  • Topics linked to linguistic simplicity: His brother’s passing, patriotism, and policy

Future Research

  • More diverse selection of documents (tweets, statements made on social media, and transcriptions of video clips)

  • Comparative analysis of the 2020 versus 2024 presidential campaign: how has Trump’s rhetoric changed over time?

  • Use of LLMs, such as the OpenAI API, to code aggression rather than the dictionary method.

  • Compare politicians from different parties (Republican versus Democrat)

Thank you!

Appendix

Monthly Average Aggression Ratio

# Dictionary of aggressive terms used to compute the aggression ratio
american_words = [
    "abuse", "abysmal", "accusation", "accusations", "accuse", "accusing", "adversarial",
    "aggressive", "anger", "angered", "annoyance", "annoyed", "annoying", "antagonistic",
    "antagonize", "appalling", "archaic", "arrogance", "arrogant", "ashamed", "assault",
    "assaulted", "assaulting", "attacking", "atrocious", "backtalk", "bitter", "bitterly",
    "bitterness", "blackened", "blackmail", "blame", "blamed", "blaming", "blunder", "bogus",
    "botch", "botched", "betray", "betrayed", "betrayal", "clownery", "chaos", "chaotic",
    "complain", "complaining", "condemn", "confront", "confrontation", "confrontational",
    "crass", "coward", "cowardly", "criticize", "criticized", "criticizing", "cruel", "cruelty",
    "debase", "debased", "deceit", "deceived", "deceive", "deception", "devious", "deviousness",
    "despicable", "disgrace", "disgraceful", "disgusting", "dishonest", "dishonorable",
    "disregard", "disreputable", "distasteful", "dodgy", "dull", "embarrass", "embarrassing",
    "embarrassment", "fabricator", "fail", "failed", "failure", "failures", "faithless", "farcical",
    "fiasco", "fibber", "fiddle", "fiddled", "fool", "foolish", "fraud", "fraudulence",
    "fraudulent", "furious", "gimmick", "good-for-nothing", "groan", "grotesque", "hackery",
    "half-truths", "hate", "hatred", "hodgepodge", "horrendous", "hostile", "hostility",
    "humiliate", "humiliating", "hypocrisy", "hypocrite", "idiot", "idiotic", "ignorance",
    "ignorant", "ill-judged", "ill-mannered", "immoral", "inadequacy", "incapable", "inferior",
    "insult", "insulted", "insulting", "intolerant", "ironic", "irony", "irritated", "jumble",
    "laughable", "lawbreakers", "leech", "libelous", "ludicrous", "mess", "misbehave", "mischief",
    "mischievous", "mislead", "misleading", "needless", "needlessly", "neglect", "neglected",
    "neglectful", "negligent", "nonsense", "nonsensical", "nasty", "obnoxious", "offend",
    "offenders", "outrageous", "outraged", "patronize", "patronizing", "petty", "penny-pinching",
    "phony", "petulant", "prejudice", "prejudices", "predictable", "problematic", "provoke",
    "provoked", "ridicule", "ridiculous", "reprehensible", "rude", "scandal", "scandalous",
    "scapegoat", "scapegoats", "scaremonger", "scaremongering", "shady", "shameful", "shambles",
    "sham", "shenanigans", "short-sighted", "silly", "silliness", "slander", "slanderous",
    "sleaze", "sleazy", "sly", "slyness", "smokescreen", "sneaky", "spite", "spiteful", "steal",
    "stereotyping", "stubborn", "stupid", "stupidity", "subterfuge", "swindling", "tactic",
    "talking back", "trick", "trickery", "unacceptable", "unhelpful", "unnatural", "untrue",
    "undermine", "vindictive", "villain", "woeful", "wrong"
]
import pandas as pd
import json

# Path to your file
file_path = '/Users/KaylaMuller/desktop/text_analysis/week12/cleantext_DonaldTrump.jsonl.txt'

# Read the file line by line and parse each line as JSON
data = []
with open(file_path, 'r', encoding='utf-8') as f:
    for line in f:
        data.append(json.loads(line))

# Turn into a DataFrame
Trumpdf = pd.DataFrame(data)
import pandas as pd
import re

# Make sure your list of words is defined
word_list = set(american_words)  

# Compile a regex pattern that matches any of the words, word-boundary safe
pattern = re.compile(r'\b(' + '|'.join(re.escape(word) for word in word_list) + r')\b', re.IGNORECASE)

# Apply a function to count matches in each row
Trumpdf["NegativeWordCount"] = Trumpdf["CleanText"].astype(str).apply(lambda text: len(pattern.findall(text)))
Trumpdf["TotalWordCount"] = Trumpdf["CleanText"].astype(str).apply(lambda text: len(re.findall(r'\b\w+\b', text)))
Trumpdf["neg_ratio"] = Trumpdf["NegativeWordCount"] / Trumpdf["TotalWordCount"] * 100

# Ensure the 'Date' column is in datetime format
Trumpdf["Date"] = pd.to_datetime(Trumpdf["Date"], errors="coerce")

# Drop rows where 'Date' is NaT (invalid dates)
Trumpdf = Trumpdf.dropna(subset=["Date"])

# Extract YearMonth in string format (YYYY-MM) for easier handling in ggplot
Trumpdf["YearMonth"] = Trumpdf["Date"].dt.to_period('M').astype(str)

# Calculate the average 'neg_ratio' by 'YearMonth'
monthly_avg_neg_ratio = Trumpdf.groupby("YearMonth")["neg_ratio"].mean().reset_index()

# Export the result to CSV for use in R
monthly_avg_neg_ratio.to_csv("monthly_avg_neg_ratio.csv", index=False)
library(reticulate)
library(ggplot2)

# Load the CSV file (make sure you have the correct path to the file)
df <- read.csv("monthly_avg_neg_ratio.csv")

# Convert 'YearMonth' to a date format
df$YearMonth <- as.Date(paste0(df$YearMonth, "-01"))

# Plot the data
ggplot(df, aes(x = YearMonth, y = neg_ratio)) +
  geom_line() +
  labs(title = "Monthly Average Aggression Ratio", x = "Month", y = "Aggression Ratio (%)") +
  theme_minimal()
# Sort by 'YearMonth' to ensure the rolling average works correctly
monthly_avg_neg_ratio = monthly_avg_neg_ratio.sort_values("YearMonth")

# Calculate the two-month rolling average of 'neg_ratio'
monthly_avg_neg_ratio["TwoMonthRollingAvg"] = monthly_avg_neg_ratio["neg_ratio"].rolling(window=2).mean()

# Export the result to CSV for use in R
monthly_avg_neg_ratio.to_csv("monthly_avg_neg_ratio_with_rolling_avg.csv", index=False)
library(ggplot2)
library(readr)
library(dplyr)

# Read the data
monthly_avg_neg_ratio <- read_csv("monthly_avg_neg_ratio_with_rolling_avg.csv")

# Convert YearMonth to Date type
monthly_avg_neg_ratio <- monthly_avg_neg_ratio %>%
  mutate(Date = as.Date(paste0(YearMonth, "-01")))

# Plot with ggplot
ggplot(monthly_avg_neg_ratio, aes(x = Date)) +
  geom_line(aes(y = neg_ratio), color = "blue", linetype = "dashed", linewidth = 1) +
  geom_line(aes(y = TwoMonthRollingAvg), color = "red", linewidth = 1) +
  labs(title = "Monthly Negative Ratio with Two-Month Rolling Average",
       x = "Date",
       y = "Negative Ratio (%)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "1 month")

Analysis of Aggression in the 75th Percentile

# Subset the DataFrame to select only rows where 'neg_ratio' > 0.206258
subset_df = Trumpdf[Trumpdf["neg_ratio"] > 0.206258]

# Calculate the average 'neg_ratio' by 'YearMonth'
subset_monthly_avg_neg_ratio = subset_df.groupby("YearMonth")["neg_ratio"].mean().reset_index()

# Export the result to CSV for use in R
subset_monthly_avg_neg_ratio.to_csv("monthly_avg_neg_ratio.csv", index=False)
library(reticulate)
library(ggplot2)

# Load the CSV file (make sure you have the correct path to the file)
df_with_subset <- read.csv("monthly_avg_neg_ratio.csv")

# Convert 'YearMonth' to a date format
df_with_subset$YearMonth <- as.Date(paste0(df_with_subset$YearMonth, "-01"))

# Plot the data
ggplot(df_with_subset, aes(x = YearMonth, y = neg_ratio)) +
  geom_line() +
  labs(title = "Monthly Average Aggression Ratio for the 75th percentile", x = "Month", y = "Aggression Ratio (%)") +
  theme_minimal()

Monthly Average Flesch Score

from textstat import flesch_reading_ease

Trumpdf['flesch_score'] = Trumpdf['CleanText'].apply(flesch_reading_ease)

# Calculate the average 'flesch_score' by 'YearMonth'
monthly_avg_flesch_score = Trumpdf.groupby("YearMonth")["flesch_score"].mean()

# Export the result to CSV for use in R
monthly_avg_flesch_score.to_csv("monthly_avg_flesch_score.csv", index=True)
library(ggplot2)
library(readr)
library(dplyr)

# Read the data
monthly_avg_flesch_score <- read_csv("monthly_avg_flesch_score.csv")

# Convert YearMonth to Date type
monthly_avg_flesch_score <- monthly_avg_flesch_score %>%
  mutate(Date = as.Date(paste0(YearMonth, "-01")))

# Plot with ggplot
ggplot(monthly_avg_flesch_score, aes(x = Date)) +
  geom_line(aes(y = flesch_score), color = "blue", linewidth = 1) +
  labs(title = "Monthly Average Flesch Score",
       x = "Date",
       y = "Flesch Score") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "1 month")

Analysis of Flesch Score Above the 75th Percentile

# Subset the DataFrame to select only rows where 'flesch_score' > 68.72
subset_df_flesch_score = Trumpdf[Trumpdf["flesch_score"] > 68.72]

# Calculate the average 'flesch_score' by 'YearMonth'
subset_monthly_avg_flesch_score = subset_df_flesch_score.groupby("YearMonth")["flesch_score"].mean().reset_index()

# Export the result to CSV for use in R
subset_monthly_avg_flesch_score.to_csv("subset_monthly_avg_flesch_score.csv", index=False)
library(ggplot2)
library(readr)
library(dplyr)

# Read the data
subset_monthly_avg_flesch_score <- read_csv("/Users/KaylaMuller/Desktop/text_analysis/week12/subset_monthly_avg_flesch_score.csv")

# Convert YearMonth to Date type
subset_monthly_avg_flesch_score <- subset_monthly_avg_flesch_score %>%
  mutate(Date = as.Date(paste0(YearMonth, "-01")))

# Plot with ggplot
ggplot(subset_monthly_avg_flesch_score, aes(x = Date)) +
  geom_line(aes(y = flesch_score), color = "blue", linewidth = 1) +
  labs(title = "Monthly Average Flesch Score for the 75th Percentile",
       x = "Date",
       y = "Flesch Score") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "1 month")

Topic Modeling: Aggression in the 75th Percentile

import string
import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer

# Make a copy to avoid SettingWithCopyWarning
subset_df = subset_df.copy()

# Setup
stop = set(stopwords.words('english'))
stop.add('applause')  # custom stopword
lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

# Combined cleaning function
def clean_text(text):
    text = text.lower()  # lowercase
    text = text.translate(str.maketrans('', '', string.punctuation))  # remove punctuation
    text = re.sub(r'\d+', '', text)  # remove numbers
    tokens = word_tokenize(text)  # tokenize
    tokens = [word for word in tokens if word not in stop]  # remove stopwords
    tokens = [lemmatizer.lemmatize(word) for word in tokens]  # lemmatization
    tokens = [stemmer.stem(word) for word in tokens]  # stemming
    return ' '.join(tokens)

# Apply to DataFrame
subset_df['CleanText_transformed'] = subset_df['CleanText'].apply(clean_text)
# Vectorize
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(max_df=0.9, min_df=2, stop_words='english')  # stop_words optional now
dtm = vectorizer.fit_transform(subset_df['CleanText_transformed'])
from sklearn.decomposition import LatentDirichletAllocation

lda = LatentDirichletAllocation(n_components=11, random_state=42)  # 11 topics, per the coherence results above
lda.fit(dtm)
def display_topics(model, feature_names, num_top_words):
    for idx, topic in enumerate(model.components_):
        print(f"Topic #{idx + 1}: ", " ".join([feature_names[i] for i in topic.argsort()[:-num_top_words - 1:-1]]))
        
display_topics(lda, vectorizer.get_feature_names_out(), 10)
topic_results = lda.transform(dtm)
subset_df['DominantTopic'] = topic_results.argmax(axis=1)

WordClouds Representing Top Topics: Aggression

from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Get the feature names (words)
feature_names = vectorizer.get_feature_names_out()

# Loop over each topic
for topic_idx, topic_weights in enumerate(lda.components_):
    # Create dictionary: word -> weight
    word_freq = {feature_names[i]: topic_weights[i] for i in topic_weights.argsort()[:-31:-1]}  # top 30 words
    
    # Generate the word cloud
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(word_freq)
    
    # Plot the word cloud
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.title(f"Topic #{topic_idx + 1}")
    plt.show()

Topic Modeling: Simplicity in the 75th Percentile

import string
import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer

# Make a copy to avoid SettingWithCopyWarning
subset_df_flesch_score = subset_df_flesch_score.copy()

# Setup
stop = set(stopwords.words('english'))
stop.add('applause')  # custom stopword
lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

# Combined cleaning function
def clean_text(text):
    text = text.lower()  # lowercase
    text = text.translate(str.maketrans('', '', string.punctuation))  # remove punctuation
    text = re.sub(r'\d+', '', text)  # remove numbers
    tokens = word_tokenize(text)  # tokenize
    tokens = [word for word in tokens if word not in stop]  # remove stopwords
    tokens = [lemmatizer.lemmatize(word) for word in tokens]  # lemmatization
    tokens = [stemmer.stem(word) for word in tokens]  # stemming
    return ' '.join(tokens)

# Apply to DataFrame
subset_df_flesch_score['CleanText_transformed'] = subset_df_flesch_score['CleanText'].apply(clean_text)
# Vectorize
from sklearn.feature_extraction.text import CountVectorizer

vectorizer2 = CountVectorizer(max_df=0.9, min_df=2, stop_words='english')  # stop_words largely redundant after cleaning
dtm2 = vectorizer2.fit_transform(subset_df_flesch_score['CleanText_transformed'])
from sklearn.decomposition import LatentDirichletAllocation

lda2 = LatentDirichletAllocation(n_components=11, random_state=42)  # 11 topics, matching the reported output
lda2.fit(dtm2)
def display_topics(model, feature_names, num_top_words):
    for idx, topic in enumerate(model.components_):
        print(f"Topic #{idx + 1}: ", " ".join([feature_names[i] for i in topic.argsort()[:-num_top_words - 1:-1]]))
        
display_topics(lda2, vectorizer2.get_feature_names_out(), 10)
topic_results2 = lda2.transform(dtm2)
subset_df_flesch_score['DominantTopic'] = topic_results2.argmax(axis=1)

WordClouds Representing Top Topics: Simplicity

from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Get the feature names (words)
feature_names = vectorizer2.get_feature_names_out()

# Loop over each topic
for topic_idx, topic_weights in enumerate(lda2.components_):
    # Create dictionary: word -> weight
    word_freq = {feature_names[i]: topic_weights[i] for i in topic_weights.argsort()[:-31:-1]}  # top 30 words
    
    # Generate the word cloud
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(word_freq)
    
    # Plot the word cloud
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.title(f"Topic #{topic_idx + 1}")
    plt.show()